Apparatus and method for Fast Hadamard Transforms

ABSTRACT

A Fast Hadamard Transform generator serially performs a Fast Hadamard Transform of a sampled signal from a first channel. The Fast Hadamard Transform generator comprises a series of stages. Each stage includes a shift register for serially receiving samples of the signal. Each stage further includes a two's complement generator for producing a two's complement of a first sample of the signal and a first multiplexer for selecting between a first sample of the signal and the two's complement of the first sample. A first adder then generates a sum of a second sample of the signal and the first sample and a difference of the second sample and the first sample and supplies the sum and the difference to the shift register of the next stage. In one embodiment the shift registers are implemented in random access memory.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to the provisional patent application Ser. No. 60/346,143 filed on Jan. 4, 2002.

BRIEF DESCRIPTION OF THE INVENTION

[0002] This invention relates generally to digital communication systems. More particularly, the invention is directed toward a technique for serially performing a Fast Hadamard Transform (FHT).

BACKGROUND OF THE INVENTION

[0003] A Hadamard Transform is obtained by multiplying a Hadamard matrix by a vector. A Hadamard matrix is a square array of positive and negative ones whose rows (and columns) are mutually orthogonal. By mutually orthogonal is meant that the sum of the products of each element of a row (or column) with the corresponding element of another row (or column) is zero. Since the elements of a Hadamard matrix have only two possible values, the orthogonality property requires that half the elements in a row (or column) have the same value as the corresponding elements in any other row (or column) and half have the opposite value. Conversely, the sum of the products of each element of a row (or column) with the same element in the same row (or column) is equal to the number of elements in the row (or column).

[0004] The fundamental Hadamard matrix, H₂, is a 2×2 array where the subscript of H is the order of the matrix (i.e., the number of its rows and columns). In what is known as normal form, the fundamental Hadamard matrix is written so that its first row and first column contain only positive ones: $\begin{matrix} {H_{2} = \begin{bmatrix} {+ 1} & {+ 1} \\ {+ 1} & {- 1} \end{bmatrix}} & (1) \end{matrix}$

[0005] Larger Hadamard matrices are generated recursively using the recursion $\begin{matrix} {H_{2^{n}} = {H_{2} \otimes H_{2^{({n - 1})}}}} & (2) \end{matrix}$

[0006] where ⊗ is a mathematical operator known as the Kronecker product. The Kronecker product multiplies each of the elements of the matrix to the left of the ⊗ operator (i.e., the four entries in the fundamental matrix H₂) with the matrix $H_{2^{(n-1)}}$ to the right of the ⊗ operator. Thus, in equation (2), the Kronecker product replaces each of the four entries in the fundamental matrix H₂ with the matrix $H_{2^{(n-1)}}$ multiplied by +1 or −1 depending on the sign of the entry in the fundamental matrix. For example, $\begin{matrix} \begin{matrix} {H_{4} = {{H_{2} \otimes H_{2}} = {\begin{bmatrix} {+ 1} & {+ 1} \\ {+ 1} & {- 1} \end{bmatrix} \otimes \begin{bmatrix} {+ 1} & {+ 1} \\ {+ 1} & {- 1} \end{bmatrix}}}} \\ {= {\begin{bmatrix} {+ 1} & {+ 1} & {+ 1} & {+ 1} \\ {+ 1} & {- 1} & {+ 1} & {- 1} \\ {+ 1} & {+ 1} & {- 1} & {- 1} \\ {+ 1} & {- 1} & {- 1} & {+ 1} \end{bmatrix} = \begin{bmatrix} H_{2} & H_{2} \\ H_{2} & {- H_{2}} \end{bmatrix}}} \end{matrix} & (3) \end{matrix}$
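By way of illustration only, the recursion of equation (2) can be sketched in a few lines of Python. The function names `kronecker`, `hadamard` and `rows_orthogonal` are hypothetical and are not part of the described apparatus; the sketch merely builds the normal-form matrix and checks the orthogonality property described above.

```python
H2 = [[1, 1], [1, -1]]  # fundamental Hadamard matrix of equation (1)

def kronecker(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]

def hadamard(n):
    """Normal-form Hadamard matrix of order 2**n (n >= 1), per equation (2)."""
    h = H2
    for _ in range(n - 1):
        h = kronecker(H2, h)
    return h

def rows_orthogonal(h):
    """True if distinct rows have zero dot product and each row
    dotted with itself gives the order of the matrix."""
    order = len(h)
    for i in range(order):
        for j in range(order):
            dot = sum(p * q for p, q in zip(h[i], h[j]))
            if dot != (order if i == j else 0):
                return False
    return True
```

Calling `hadamard(2)` reproduces the matrix H₄ of equation (3), and `hadamard(6)` yields a 64×64 matrix of the size used in IS-95.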

[0007] Since the only values in the fundamental Hadamard matrix are +1 and −1, the values of any Hadamard matrix can only be +1 or −1. Frequently, other binary expressions are used in place of +1 and −1. For example, a plus sign and a minus sign can be used in place of +1 and −1, respectively; black and white squares can be used instead of +1 and −1 to provide a visual representation of the matrix; and for signaling applications +1 is replaced by a logic 0 and −1 by a logic 1.

[0008] The strict binary nature of the Hadamard matrix helps it and related mathematical expressions such as Walsh matrices find wide application in digital communications. A leading example is the CDMA cellular standard, IS-95, which uses a 64×64 Hadamard matrix, H₆₄. The H₆₄ matrix is reproduced at pages 449-450 of J. S. Lee & L. E. Miller, CDMA Systems Engineering Handbook (Artech, 1998).

[0009] Different properties of the Hadamard matrix are used in base-to-mobile (forward channel) and mobile-to-base (reverse channel) transmissions in CDMA telephony. The forward channel employs the Hadamard matrix for two purposes. First, each base station uses it to separate outbound transmissions targeted for different mobile users. Second, the base station employs it to spread the signal bandwidth of the transmission.

[0010] In the reverse channel, for every six information bits generated at the mobile radio, the mobile radio transmits one 64-bit row of the Hadamard matrix. Each such row is referred to as a Hadamard (or Walsh) sequence. The mobile radio uses the six data bits as a binary address in a lookup-table to select one of the Hadamard matrix rows, and it substitutes the 64 bits of this row for the six data bits. This action both encodes and spreads the signal, as in the forward link, but by a smaller spreading factor, 64/6=10.67.
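The lookup described above may be sketched as follows. This is a minimal illustration, assuming that the six data bits are read most significant bit first (the bit ordering is an assumption of the sketch) and using the logic-level convention noted earlier in which +1 is represented by logic 0 and −1 by logic 1. The function name `encode` is hypothetical. Rather than storing a table, the sketch relies on the known property of the normal-form matrix that the entry in row i and column j is −1 exactly when i AND j has an odd number of set bits.

```python
def encode(bits):
    """Map six data bits to the 64 logic-level bits of one Hadamard row.

    bits: sequence of six 0/1 values, most significant bit first
          (ordering assumed for this sketch).
    """
    if len(bits) != 6:
        raise ValueError("expected exactly six data bits")
    index = int("".join(str(b) for b in bits), 2)  # row address, 0..63
    # Logic 1 marks a -1 entry; parity of (index & column) gives the sign.
    return [bin(index & col).count("1") % 2 for col in range(64)]
```

For example, `encode([0, 0, 0, 0, 0, 0])` returns 64 logic zeros (the all +1 row), and `encode([0, 0, 0, 0, 0, 1])` returns the alternating pattern 0, 1, 0, 1, and so on.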

[0011] This encoding acts primarily as a robust error-correction scheme that a mobile radio can perform efficiently and cheaply. When the base station receives the encoded signal, it uses an inverse Hadamard transform to decode the data. Because the data bits are unknown, the base station multiplies a column vector of every 64 received symbols by the entire 64×64 Hadamard matrix, H₆₄. The result is another column vector of 64 values. Mathematically, this operation is represented as the product of the Hadamard matrix H with the input vector x, yielding the Hadamard transform, y:

y=H ₆₄ x   (4)

[0012] Since the received symbols should be one row of the H₆₄ matrix, they should be orthogonal to all the rows except one of the H₆₄ matrix. Thus, all the rows except one of the resulting column vector should ideally have a zero value and the row that corresponds to the 64 received symbols should be identifiable by the presence of a non-zero value that is the sum of the absolute values of the 64 received symbols. The number of that row, represented in binary, yields the six data bits sent.
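The decoding of equation (4) can be illustrated with a direct, unoptimized matrix product. The sketch below assumes that the received symbols are expressed as +1 and −1 values (or soft values of either sign) and that the row number is converted to data bits most significant bit first; both are assumptions, and the names `hadamard_row`, `hadamard_transform` and `decode` are hypothetical.

```python
def hadamard_row(i, order=64):
    """Row i of the normal-form Hadamard matrix as +1/-1 values."""
    return [-1 if bin(i & j).count("1") % 2 else 1 for j in range(order)]

def hadamard_transform(x):
    """Direct product y = H x using N*N multiply-accumulate steps."""
    n = len(x)
    return [
        sum(h * v for h, v in zip(hadamard_row(i, n), x))
        for i in range(n)
    ]

def decode(symbols):
    """Return (six data bits, transform) for 64 received symbols."""
    y = hadamard_transform(symbols)
    row = max(range(len(y)), key=lambda i: y[i])  # row with the largest value
    bits = [(row >> k) & 1 for k in range(5, -1, -1)]
    return bits, y
```

When the symbols are exactly row 45 of H₆₄, the transform holds 64 in row 45 and zero elsewhere, and the recovered bits are 1, 0, 1, 1, 0, 1.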

[0013] Multiplication of two matrices ordinarily requires each element of the first matrix to be multiplied by one of the elements in each column of the second matrix. Thus, when a square matrix of order N is multiplied by a column vector, the number of multiplications ordinarily required is N².

[0014] Because the Hadamard matrix contains only +1 and −1 values, however, the Hadamard transform can be computed simply by adding and subtracting the components of the column vector. Moreover, if the components of the column vector are all integers, the computation of the Hadamard transform does not require any floating point operations.

[0015] In prior art implementations of the Hadamard transform, certain symmetries of the Hadamard matrix are used in a reorganization of the computation algorithm such that the total number of functions required is reduced to log₂(N)*N. In such implementations, all N inputs of the column vector must be present before the Hadamard transform operation can be performed. Other prior art Hadamard transform implementations use parallel techniques that require storing multiple data samples prior to calculation of the transform. Computer implementations of parallel Hadamard transform engines are thus subject to high memory requirements and latency, and similarly high power consumption.
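For reference, the reorganized computation mentioned above is commonly written as an in-place "butterfly" procedure. The sketch below is a generic textbook form and is not asserted to be any particular prior art implementation; the function name `fwht` and the operation counter are included only to illustrate the log₂(N)*N count. Note that the whole input vector must be available before the first butterfly is computed.

```python
def fwht(x):
    """Fast Hadamard transform of a list whose length is a power of two.

    Returns (y, ops) where y = H x in natural (normal-form) row order and
    ops is the number of additions and subtractions performed.
    """
    y = list(x)
    n = len(y)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    ops = 0
    h = 1
    while h < n:                        # log2(n) passes
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b   # one sum and one difference
                ops += 2
        h *= 2
    return y, ops
```

For N=64 the counter reports 384 operations (64 times 6), compared with 4096 for the direct matrix product.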

[0016] In view of the foregoing, it is highly desirable to improve the implementation of a Hadamard transform, while reducing the resources used to implement the transform.

SUMMARY OF THE INVENTION

[0017] A Fast Hadamard Transform generator according to an embodiment of the invention, serially performs a Fast Hadamard Transform of a sampled signal from a first channel. The Fast Hadamard Transform generator comprises a series of stages. Each stage includes a shift register for serially receiving samples of the signal. Each stage further includes a two's complement generator for producing a two's complement of a first sample of the signal and a first multiplexer for selecting between a first sample of the signal and the two's complement of the first sample. A first adder generates a sum of a second sample of the signal and the first sample and a difference of the second sample and first sample and supplies the sum and the difference to the shift register of the next stage. In one embodiment of the invention, the shift registers are implemented in a random access memory.

BRIEF DESCRIPTION OF THE DRAWINGS

[0018] For a better understanding of the invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings, in which:

[0019] FIG. 1 is a block diagram of an FHT generator for performing a Fast Hadamard Transform in parallel;

[0020] FIG. 2A is a block diagram of an FHT generator for performing a Fast Hadamard Transform in accordance with an embodiment of the invention;

[0021] FIG. 2B is a flowchart of a method for generating an FHT in accordance with the invention;

[0022] FIG. 2C contains various tables showing the states of the various shift registers of FIG. 2A in accordance with an embodiment of the invention;

[0023] FIG. 2D is a flowchart of a method for generating an FHT of a first order from FHTs of a second order according to an embodiment of the invention;

[0024] FIG. 3 is a block diagram of a shift register implemented using a counter and a synchronous RAM in accordance with an embodiment of the invention;

[0025] FIG. 4 is a block diagram of an FHT generator for performing a Fast Hadamard Transform wherein shift registers are implemented with random access memories in accordance with an embodiment of the invention;

[0026] FIG. 5 is a block diagram of an FHT generator for performing a Fast Hadamard Transform implemented using FIFO and accumulator blocks in accordance with an embodiment of the invention; and

[0027] FIG. 6 is a block diagram of an accumulator block in accordance with an embodiment of the invention.

[0028] Like reference numerals refer to corresponding parts throughout the drawings.

DETAILED DESCRIPTION OF THE INVENTION

[0029] FIG. 1 is a block diagram of a parallel Fast Hadamard Transform generator 10. For purposes of illustration, FHT generator 10 is depicted as performing a Hadamard transform on an H₆₄ matrix but the invention may also be practiced on Hadamard matrices of any size. Generator 10 comprises 64 exclusive OR (XOR) gates 20-0 through 20-63 and a 64 channel accumulator 30. One input to the XOR gates is a 64 bit long serial input data signal that is applied via input lead 40 one bit at a time in parallel to one input of each of the XOR gates 20-0 through 20-63. Illustratively, this signal has been received at generator 10 after transmission over a communication channel and it is known that the signal as transmitted was one of the 64 Hadamard sequences of the H₆₄ matrix. A second input to each of the XOR gates is a 64 bit long Hadamard sequence, H_(i), with a different Hadamard sequence being applied one bit at a time in parallel to each of the 64 XOR gates. The Hadamard sequences H₀-H₆₃ are obtained on leads 50 from a 4096 bit memory 60 that stores the bit-values of the Hadamard matrix H₆₄.

[0030] The XOR gates produce output signals in accordance with the familiar exclusive OR truth table. If the two inputs to an XOR gate are the same, the output is a logic 0 and if the two inputs are different, the output is a logic 1. The output signals from each XOR gate are provided to accumulator 30 where the outputs of each XOR gate are summed separately, each in a different channel of the accumulator. Thus, each channel of the accumulator is associated with one XOR gate and therefore with the specific Hadamard sequence applied to that XOR gate. Since the input data signal should also be a Hadamard sequence, the two inputs to one of the 64 XOR gates should always be the same and the outputs of that XOR gate should always be logic 0. For each of the other XOR gates the two inputs should be the same only half the time so that half the outputs should be logic 0 and half logic 1. Thus, upon converting the logic 0 values to an arithmetic value of 1 and the logic 1 values to an arithmetic value of −1, the accumulator will ideally accumulate in one channel a value of 64 for the output from one of the XOR gates and in the other channels a value of 0 for the outputs from all the other XOR gates. Thus, the particular Hadamard sequence that was transmitted can be identified by identifying the channel in the accumulator that stores the highest value and then ascertaining the Hadamard sequence that was supplied to the XOR gate that supplied signals to that channel. Even if the received data signal has been corrupted in transmission and is not a Hadamard sequence, the received signal should be close enough to the Hadamard sequence that was transmitted that the accumulated output from one of the XOR gates will be readily distinguished from the outputs of all the other XOR gates and will identify the Hadamard sequence that was transmitted.
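The behavior of generator 10 can be modeled in software as follows. This is a behavioral sketch only; the names `hadamard_bit` and `parallel_fht` are hypothetical, and the contents of memory 60 are computed on the fly from the parity of the row and column indices instead of being read from a stored table.

```python
def hadamard_bit(i, j):
    """Logic-level entry at row i, column j: 0 stands for +1, 1 for -1."""
    return bin(i & j).count("1") % 2

def parallel_fht(bits):
    """Model of FIG. 1: 64 XOR gates feeding a 64 channel accumulator.

    bits: the 64 received logic-level bits, in order of arrival.
    Returns the 64 accumulated channel values.
    """
    acc = [0] * 64
    for j, bit in enumerate(bits):      # one input bit per clock
        for i in range(64):             # gates 20-0 .. 20-63 act in parallel
            out = bit ^ hadamard_bit(i, j)
            acc[i] += 1 if out == 0 else -1   # logic 0 -> +1, logic 1 -> -1
    return acc
```

If the input is Hadamard sequence 19, channel 19 accumulates 64 and every other channel accumulates 0. If five of the 64 bits are inverted, channel 19 still holds 54 while no other channel can exceed 10 in magnitude, so the transmitted sequence remains identifiable.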

[0031] The FHT generator of FIG. 1 is described as processing logic level data (ones and zeroes). However, it can be modified to accept signed binary data by replacing the XOR gates with conditional numeric complementer circuits such that when a given one of the signals H₀-H₆₃ is not asserted, the data 40 of FIG. 1 is passed through unchanged to the corresponding channel of accumulator 30. When that signal is asserted, the data 40 of FIG. 1 is numerically complemented (typically two's complemented) and passed through to the accumulator 30. The width of the data that accumulator 30 must process and hold is greater than that required in the logic level case.

[0032] While FHT generator 10 of FIG. 1 is quite fast, it is hardware intensive, requiring 4K bits of memory to store the Hadamard matrix as well as a 64 channel accumulator. We have found that an alternative is to store the Hadamard matrix in the processing algorithm itself.

[0033] FIG. 2A is a block diagram of FHT generator 100 for serially calculating a Fast Hadamard Transform according to an embodiment of the invention. For purposes of illustration, FIG. 2A depicts four stages for performing a Fast Hadamard Transform of size 2⁴=16. Each stage comprises a shift register 104-n, a two's complement generator 108-n, a 2:1 multiplexer 110-n and an adder 112-n, where n is the number of the stage. Each shift register 104-n has at least a first register 106-n and a last register 107-n and all but the first shift register 104-1 have additional intermediate registers. The contents of the first register 106-n of each shift register 104-n are provided as an input to adder 112-n. The contents of the last register 107-n of each shift register are provided as an input to both multiplexer 110-n and to two's complement generator 108-n. The output of the two's complement generator is provided as the second input to multiplexer 110-n. The output of multiplexer 110-n is provided as a second input to adder 112-n. A larger Fast Hadamard Transform is performed by adding more stages. For example, for IS-95 a 6 stage FHT generator is used to perform a Fast Hadamard Transform of size 2⁶=64.

[0034] The combination of the two's complement generator, the multiplexer and the adder may alternatively be implemented as an adder/subtractor, in which case the contents of the last register 107-n of the shift register 104-n are either subtracted from or added to the contents of the first register 106-n of the shift register 104-n.
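The arithmetic of one stage can be sketched in fixed-width integer form. The word width of 10 bits and the names `WIDTH`, `twos_complement`, `stage_output` and `to_signed` are assumptions made for illustration; the sketch only shows that selecting the two's complement of the last register and adding it to the first register yields the difference.

```python
WIDTH = 10                      # hypothetical word width in bits
MASK = (1 << WIDTH) - 1

def twos_complement(value):
    """Two's complement of a WIDTH-bit word (models generator 108-n)."""
    return (~value + 1) & MASK

def stage_output(first, last, subtract):
    """Models multiplexer 110-n followed by adder 112-n.

    first, last: WIDTH-bit words from the first and last registers.
    subtract:    multiplexer select; True picks the complemented value.
    """
    operand = twos_complement(last) if subtract else last
    return (first + operand) & MASK

def to_signed(word):
    """Interpret a WIDTH-bit word as a signed integer."""
    return word - (1 << WIDTH) if word & (1 << (WIDTH - 1)) else word
```

With the first register holding 3 and the last register holding 5, `to_signed(stage_output(3, 5, False))` gives 8 and `to_signed(stage_output(3, 5, True))` gives −2.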

[0035] The sequence of operation of FHT generator 100 is depicted in FIG. 2B. To be able to accept new data, each of the shift registers is clocked at step 220 so as to shift the data in each register to the next register in the shift register. At the same time, data is entered into the first register 106-1 of the first shift register 104-1. Data in the last register of each shift register is overwritten in this process.

[0036] Next, at step 222 each adder adds the contents of the first and last registers of the shift register of its stage and stores the results in the first register of the shift register of the next stage. Then, at step 224 each adder subtracts the contents of the last register from the contents of the first register of the shift register of its stage and stores the results in the first register of the shift register of the next stage.

[0037] The Tables of FIG. 2C illustrate the results of these operations over the first few stages of FHT generator 100. In particular, each column of each Table represents one of the shift registers 104-n as identified by the entry in the first row at the top of the column and every other row in a column represents the contents of one of the registers in that shift register. One of skill in the art will understand how these operations can be extended to larger sizes including six stages. To describe these operations, a data stream represented by A, B, C, and D will be used where A is input first and D is input fourth in a stream. Therefore, the input vector, x₄, is $\begin{matrix} {x_{4} = \begin{bmatrix} D \\ C \\ B \\ A \end{bmatrix}} & (5) \end{matrix}$

[0038] The fourth order FHT is then the product of the fourth order Hadamard matrix, H₄, of equation (3) and the input vector, x₄: $\begin{matrix} \begin{matrix} {y_{4{({ABCD})}} = {H_{4}x_{4}}} \\ {= {\begin{bmatrix} {+ 1} & {+ 1} & {+ 1} & {+ 1} \\ {+ 1} & {- 1} & {+ 1} & {- 1} \\ {+ 1} & {+ 1} & {- 1} & {- 1} \\ {+ 1} & {- 1} & {- 1} & {+ 1} \end{bmatrix}\begin{bmatrix} D \\ C \\ B \\ A \end{bmatrix}}} \\ {= \begin{bmatrix} {D + C + B + A} \\ {D - C + B - A} \\ {D + C - B - A} \\ {D - C - B + A} \end{bmatrix}} \end{matrix} & (6) \end{matrix}$

[0039] Note that here the parenthetical subscript “ABCD” denotes an FHT of the data A, B, C, and D. This notation will assist in understanding the description of FIGS. 2A and 2C.

[0040] Data A first enters the first stage shift register 104-1 at input 106-1. Table 2-1 of FIG. 2C shows the state of the various registers in the circuit of FIG. 2A after entry of data A at input 106-1. As shown in column 1, the first register of stage 1 shift register 104-1 contains the data A. The contents of the other registers are unknown or uncontrolled and a lower case notation is used throughout FIG. 2C to signify this. To accept new data, the shift registers are clocked at step 220; and data is shifted to the next register. Data B enters shift register 104-1 at input 106-1 with data A being shifted to the next register in register 104-1. Thus, as shown in the first column of Table 2-2, the last register of the first stage shift register 104-1 contains the data A and the first register contains data B.

[0041] Since shift register 104-1 contains data in its first and last registers, valid FHT computations can be made. A first computation (step 222) is to perform the addition of the data B in the first register of the first stage shift register 104-1 with the data A in the last register of the first stage shift register 104-1, the result being B+A. The result, B+A, is then entered into a first part of the first register of shift register 104-2. A second computation (step 224) is to perform the addition of the data B in the first register of shift register 104-1 with the two's complement of the data A in the last register of shift register 104-1, the result being B−A. To perform this computation, multiplexer 110-1 selects the output of two's complement generator 108-1, which is the negative of the contents of the last register of shift register 104-1, and supplies this output to adder 112-1; and adder 112-1 combines it with the contents of the first register of shift register 104-1. The result, B−A, is then entered into a second part of the first register of stage 2 shift register 104-2. Thus, the first register of the second stage shift register 104-2 contains the second order Fast Hadamard Transform of data A and B: $\begin{matrix} {y_{2{({AB})}} = \begin{bmatrix} {B + A} \\ {B - A} \end{bmatrix}} & (7) \end{matrix}$

[0042] and the contents of the first stage shift register and the first register of the second stage shift register are as shown in Table 2-2 of FIG. 2C.

[0043] Data C and D are subsequently entered into the first stage shift register 104-1. When data C is shifted into the first register of shift register 104-1 (see first column of Table 2-3), the second order FHT, y_(2(AB)), in the first register of the second stage shift register 104-2 is shifted to the second register of shift register 104-2 (see second column of Table 2-3). Computations are then performed on the first stage shift register 104-1 to generate the second order Fast Hadamard Transform of the data B and C: $\begin{matrix} {y_{2{({BC})}} = \begin{bmatrix} {C + B} \\ {C - B} \end{bmatrix}} & (8) \end{matrix}$

[0044] and store it in the first register of the second stage shift register 104-2 (see second column of Table 2-3).

[0045] When data D is shifted into the first register of the first stage shift register 104-1 (see first column of Table 2-4), three things happen. The second order FHT, y_(2(AB)), in the second register of the second stage shift register 104-2 is shifted to the last register of shift register 104-2. The second order FHT, y_(2(BC)), in the first register of the second stage shift register 104-2 is shifted to the second register of shift register 104-2 (see second column of Table 2-4). Finally, the computations are performed on the first stage shift register 104-1 to generate the second order FHT of the data C and D: $\begin{matrix} {y_{2{({CD})}} = \begin{bmatrix} {D + C} \\ {D - C} \end{bmatrix}} & (9) \end{matrix}$

[0046] and store it in the first register of the second stage shift register 104-2 (see second column of Table 2-4).

[0047] At this point, valid computations can be performed on the contents of the second stage shift register 104-2 that are similar to those performed on the contents of the first stage shift register 104-1. In particular, sums and differences of second order FHTs stored in first and last registers of the second stage shift register are used to generate fourth order FHTs. The fourth order FHT y_(4(ABCD)) is therefore produced from second order FHTs y_(2(CD)) and y_(2(AB)) as follows: $\begin{matrix} {y_{4{({ABCD})}} = \begin{bmatrix} {y_{2{({CD})}} + y_{2{({AB})}}} \\ {y_{2{({CD})}} - y_{2{({AB})}}} \end{bmatrix}} & (10a) \\ {= \begin{bmatrix} {\begin{bmatrix} {D + C} \\ {D - C} \end{bmatrix} + \begin{bmatrix} {B + A} \\ {B - A} \end{bmatrix}} \\ {\begin{bmatrix} {D + C} \\ {D - C} \end{bmatrix} - \begin{bmatrix} {B + A} \\ {B - A} \end{bmatrix}} \end{bmatrix}} & (10b) \\ {= \begin{bmatrix} {D + C + B + A} \\ {D - C + B - A} \\ {D + C - B - A} \\ {D - C - B + A} \end{bmatrix}} & (10c) \end{matrix}$

[0048] As in the first stage, difference operations can be achieved by taking the two's complement of the contents of the last register and summing with the contents of the first register. In particular, the operations are as follows. The first part of the first register of the second stage shift register 104-2, which contains the value D+C, is added by adder 112-2 to the first part of the last register of shift register 104-2, which contains the value B+A, and the result D+C+B+A is stored in a first part of a first register of the third stage shift register 104-3; the second part of the first register of the second stage shift register 104-2, which contains the value D−C, is added by adder 112-2 to the second part of the last register of shift register 104-2, which contains the value B−A, and the result D−C+B−A is stored in a second part of a first register of the third stage shift register 104-3. These add operations can be performed in parallel during a first clock cycle. In similar fashion, the contents of the last register of the stage 2 shift register are also subtracted from the contents of the first register and stored as values D+C−B−A and D−C−B+A in third and fourth parts of the first register of the third stage shift register 104-3. These difference operations can also be performed in parallel during a second clock cycle. Thus, the contents of the first, second, third and fourth parts of the first register of stage 3 are the fourth order Fast Hadamard Transform of data A, B, C and D as shown in equation (10c) and stored in the first register of the third stage shift register 104-3 (see third column of Table 2-4).

[0049] In similar fashion, y_(4(ABCD)) can be serially shifted through to the last register of the third stage shift register 104-3 as data E, F, G and H enter FHT generator 100. As data E, F, G, and H are input to FHT generator 100 the fourth order FHTs y_(4(BCDE)), y_(4(CDEF)), y_(4(DEFG)), y_(4(EFGH)) are serially computed and shifted through the third stage shift register 104-3 (see third column of Tables 2-5 through 2-8). When data H enters FHT generator 100, the fourth order FHT of data E, F, G and H is stored in the first register of the third stage shift register 104-3 having the form (see third column of Table 2-8): $\begin{matrix} {y_{4{({EFGH})}} = \begin{bmatrix} {H + G + F + E} \\ {H - G + F - E} \\ {H + G - F - E} \\ {H - G - F + E} \end{bmatrix}} & (11) \end{matrix}$

[0050] The elements of the results y_(4(EFGH)) and y_(4(ABCD)) stored in the first and last registers, respectively, of the third stage shift register 104-3 can then be added together and subtracted one from the other to compute an eighth order Fast Hadamard Transform of the data A, B, C, D, E, F, G and H: $\begin{matrix} {y_{8{({ABCDEFGH})}} = \begin{bmatrix} {y_{4{({EFGH})}} + y_{4{({ABCD})}}} \\ {y_{4{({EFGH})}} - y_{4{({ABCD})}}} \end{bmatrix}} & \quad \\ {y_{8{({ABCDEFGH})}} = \begin{bmatrix} {\begin{bmatrix} {H + G + F + E} \\ {H - G + F - E} \\ {H + G - F - E} \\ {H - G - F + E} \end{bmatrix} + \begin{bmatrix} {D + C + B + A} \\ {D - C + B - A} \\ {D + C - B - A} \\ {D - C - B + A} \end{bmatrix}} \\ {\begin{bmatrix} {H + G + F + E} \\ {H - G + F - E} \\ {H + G - F - E} \\ {H - G - F + E} \end{bmatrix} - \begin{bmatrix} {D + C + B + A} \\ {D - C + B - A} \\ {D + C - B - A} \\ {D - C - B + A} \end{bmatrix}} \end{bmatrix}} & \quad \\ {y_{8{({ABCDEFGH})}} = \begin{bmatrix} {H + G + F + E + D + C + B + A} \\ {H - G + F - E + D - C + B - A} \\ {H + G - F - E + D + C - B - A} \\ {H - G - F + E + D - C - B + A} \\ {H + G + F + E - D - C - B - A} \\ {H - G + F - E - D + C - B + A} \\ {H + G - F - E - D - C + B + A} \\ {H - G - F + E - D + C + B - A} \end{bmatrix}} & (12) \end{matrix}$

[0051] and this FHT is stored in the first register of the fourth stage shift register 104-4. The same pattern continues through any subsequent stages.
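The walk-through of paragraphs [0040] through [0051] can be reproduced with a small behavioral model. The sketch below is illustrative only: the names `Stage` and `serial_fht` are hypothetical, each register holds a tuple of values (the "parts" referred to above), and the sum and difference are computed in a single step instead of two clock cycles. It assumes that stage n holds 2^(n−1)+1 registers, which matches the 2, 3 and 5 registers used by stages 1 to 3 in the worked example; the depth used for any later stage is an assumption of the sketch.

```python
from collections import deque

class Stage:
    """One stage: a shift register plus an add/subtract unit."""

    def __init__(self, n):
        depth = 2 ** (n - 1) + 1            # assumed register count for stage n
        self.regs = deque([None] * depth, maxlen=depth)

    def clock(self, value):
        """Shift in a new value (step 220), then form sum and difference
        of the first and last registers (steps 222 and 224)."""
        self.regs.appendleft(value)         # old last register is overwritten
        first, last = self.regs[0], self.regs[-1]
        if first is None or last is None:
            return None                     # contents not yet valid
        sums = tuple(f + l for f, l in zip(first, last))
        diffs = tuple(f - l for f, l in zip(first, last))
        return sums + diffs

def serial_fht(samples, num_stages):
    """Feed samples one per clock; return the last stage's output per clock."""
    stages = [Stage(n) for n in range(1, num_stages + 1)]
    outputs = []
    for sample in samples:
        value = (sample,)                   # an order-1 FHT is the sample itself
        for stage in stages:
            value = stage.clock(value)
        outputs.append(value)
    return outputs
```

With two stages, the output on the clock at which D enters equals equation (10c); with three stages, the output on the clock at which H enters equals equation (12). Earlier outputs are `None` because the registers do not yet hold valid data.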

[0052] In particular, the structure of stages 1 and 2 can be extended to generate higher order Fast Hadamard Transforms including a 6 stage FHT generator for producing a 64^(th) order Fast Hadamard Transform. In such an embodiment, the various shift registers are configured to store the growing sets of data to be generated.

[0053] FIG. 2D summarizes this method of generating an FHT. The method of FIG. 2D is used to generate an FHT, y_(Y,m), of order Y=2^(K) from FHTs of order X=2^(J). Note that the subscript m is used to keep track of the various Yth order FHTs that will be generated. According to the method, a number of FHTs, y_(X,n), of order X=2^(J) are serially stored at step 230. In an embodiment of the invention, this number is equal to S=2K−1 where Y=2^(K) is the order of the FHT being generated. Thus, for an IS-95 system implementing order 64=2⁶ FHTs, it is necessary to store 11 (=2*6−1) FHTs of order 32=2⁵. In order to implement the order 32=2⁵ FHTs, it is necessary to store 9 (=2*5−1) FHTs of order 16=2⁴; and so on. Accordingly, the various stages 1 through 6 of an FHT generator implemented in accordance with the teachings of FIGS. 2A, 2B and 2C would require 2, 3, 5, 7, 9 and 11 storage locations. Proceeding with the method of FIG. 2D, a sum of FHTs y_(X,p) and y_(X,q) is computed at step 232. Here, y_(X,p) and y_(X,q) are selected from among the various y_(X,n). The resulting sum is then stored at step 234. A difference between y_(X,p) and y_(X,q) is then computed at step 236. As discussed with reference to FIG. 2A, the difference operation can be performed by summing y_(X,p) with the two's complement of y_(X,q). The resulting difference is then stored at step 238. The stored sum and difference results provide an FHT, y_(Y,m), of order Y=2^(K) having the form $\begin{matrix} {y_{Y,\quad m} = {\begin{bmatrix} {y_{X,\quad p} + y_{X,\quad q}} \\ {y_{X,\quad p} - y_{X,\quad q}} \end{bmatrix}\quad.}} & (13) \end{matrix}$

[0054] Advantageously, the method of FIG. 2D can be applied repeatedly so as to generate an FHT from successively lower order FHTs. For example, the method of FIG. 2D was described to generate FHTs of order 64=2⁶ from FHTs of order 32=2⁵. In similar fashion the method of FIG. 2D can be applied to generate FHTs of order 32=2⁵ from FHTs of order 16=2⁴. And so on for the generation of FHTs of order 16, 8, 4, and 2. Note that an FHT matrix of order 1 is a trivial matrix containing the value 1. Accordingly, an FHT of order 1 of a stream of data is the stream of data itself.
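The repeated application of equation (13) down to order 1 can be expressed as a short recursive function. This is a sketch for illustration; the name `fht_recursive` is hypothetical, and the input is assumed to be ordered newest sample first, as in the input vector of equation (5).

```python
def fht_recursive(x):
    """FHT of x (length a power of two, newest sample first).

    Applies equation (13): the order-Y result is the sum and the difference
    of two order-Y/2 results, down to order 1 where the FHT of a sample
    is the sample itself.
    """
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    p = fht_recursive(x[:half])     # FHT of the newer half, y_(X,p)
    q = fht_recursive(x[half:])     # FHT of the older half, y_(X,q)
    sums = [a + b for a, b in zip(p, q)]
    diffs = [a - b for a, b in zip(p, q)]
    return sums + diffs
```

With D=4, C=3, B=2 and A=1, `fht_recursive([4, 3, 2, 1])` returns `[10, 2, 4, 0]`, which agrees with equation (6) evaluated term by term.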

[0055] In another embodiment of the invention, the operation of the shift-registers is replaced by iteratively addressed random access memory (RAM). In this manner, data is not actually shifted but rather addressed as necessary. As shown in FIG. 3, RAM-based shift register 250 is implemented through the use of RAM 252 and counter 254. Counter 254 has N output lines connected to the address input 260 of synchronous RAM 252. Synchronous RAM 252 has 2^(N) memory locations with M-bit values for each memory location. RAM 252 is configured such that when a memory location is addressed, data at the addressed memory location is made available at data output 258. Moreover, RAM 252 is configured such that data at data input 256 is stored at the addressed memory location. The simultaneous read and write operations just described can be achieved without errors using flip-flops at data output 258. As is widely known in the art, flip-flops have the very important characteristic that a new data state can be written in at the same time that an old data state is being read out. Counter 254 is configured to cycle through a predetermined number of unique states. For example, counter 254 can be configured to cycle through all the states needed to transfer data among the 2, 3, 5, 7, 9, 11 memory locations as if the apparatus of FIG. 3 were used in a straightforward replacement of stages 1 through 6 of FHT generator 100.

[0056] Essentially, the apparatus of FIG. 3 uses a set of RAM locations as a shift register. For example, to replace shift register 104-1, RAM-based shift register 250 requires two (2) memory locations and a counter with two (2) states. As data comes into the RAM-based shift register, such data is sequentially stored at one of the two memory locations. A sum of the data at the two memory locations is then computed just as at step 232 and stored at a third memory location as at step 234; and a difference of the data at the two memory locations is computed as at step 236 and stored at a fourth memory location as at step 238. Recall that in shift register 104-2 of FIG. 2A, the storage elements of the shift register have first and second parts for storing a sum and a difference. The increased memory requirements are met by using two sets of memory locations within RAM-based shift register 250 each having three memory locations. One set of memory locations includes the third memory location and is configured to store the sum during a first clock cycle and the second set of memory locations includes the fourth memory location and is configured to store the difference during a second clock cycle. Advantageously, the sum and difference values in the two sets of memory locations are subsequently processed in parallel. This manner of using RAMs in parallel widens the RAM output. Where one RAM output has M-bit values, two RAMs in parallel have 2*M-bit values. These principles are extended to the other stages such that four (4) sets of memory locations with 5 memory locations each are required at stage 3; eight (8) sets of memory locations with 7 memory locations each are required at stage 4; and so on. Those of skill in the art will understand that other addressing schemes as well as other configurations can be implemented for use as a RAM-based shift register.
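The counter-addressed arrangement of FIG. 3 can be modeled as a circular buffer. The sketch below is a simplified behavioral model; the class name `RamShiftRegister` is hypothetical, and it assumes that on each clock the old contents of the addressed location are read out before the new data is written, after which the counter advances. Under that assumption the output is the input delayed by the number of counter states. How these locations correspond one-to-one to the first and last registers of FIG. 2A is not modeled here.

```python
class RamShiftRegister:
    """Counter plus RAM behaving as a delay line (model of FIG. 3)."""

    def __init__(self, states):
        self.ram = [None] * states      # only `states` locations are addressed
        self.counter = 0                # models counter 254

    def clock(self, data_in):
        """Read the addressed location, overwrite it, advance the counter."""
        data_out = self.ram[self.counter]        # old data at data output 258
        self.ram[self.counter] = data_in         # new data from data input 256
        self.counter = (self.counter + 1) % len(self.ram)
        return data_out
```

No data is moved between locations; only the address changes from clock to clock, which is the point of replacing a physical shift register with RAM.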

[0057] Instead of using a single RAM-based shift register 250 to calculate the FHT, separate RAM-based shift registers can be used at each stage. FIG. 4 is a block diagram of an FHT generator 270 for performing a serial Fast Hadamard Transform using four stages of RAM-based shift registers 204-n, where n is the number of the stage, configured to operate in the manner of RAM-based shift register 250. RAM-based shift registers 204-n are configured to store the same information as shift registers 104-n of FIG. 2 and to operate in analogous fashion. Analogous elements of FIG. 4 bear the same number as the corresponding element of FIG. 2 increased by 100. Thus, in operation, FHT generator 270 receives data at input 206-1 just as FHT generator 100 of FIG. 2 received data at input 106-1; and adders 212-n generate sums and differences of the values at the first and last memory locations of each RAM-based shift register 204-n.

[0058] Just as for FHT generator 100, FHT generator 270 can be extended to more stages so as to generate higher order results. Other embodiments of the invention utilize a combination of shift registers 104-n in the lower stages and RAM-based shift registers 204-n in the higher stages. For example, the memory requirements for stages 1 and 2 may be met by actual shift registers while larger-memory requirements such as for those of stages 3 and above may be met through the use of RAM.
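The way additional stages yield higher order results can be illustrated arithmetically. The following is a minimal sketch of the sum-and-difference recursion only; it does not model the serial shift-register timing of FHT generators 100 and 270, and its output ordering (that of a Sylvester-type Hadamard matrix) is an assumption that may differ from the ordering produced by the hardware. The function name `fht` is hypothetical.

```python
def fht(samples):
    """Hadamard transform of a power-of-two-length sequence using
    log2(n) passes of sums and differences (one pass per stage)."""
    data = list(samples)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    span = 1  # order of the partial transforms being combined in this pass
    while span < n:
        for start in range(0, n, 2 * span):
            for i in range(start, start + span):
                a, b = data[i], data[i + span]
                data[i] = a + b          # sum of two lower-order results
                data[i + span] = a - b   # difference of the same two results
        span *= 2
    return data


print(fht([1, 2, 3, 4]))  # [10, -2, -4, 0]
```

Each pass of the outer loop corresponds to one further stage: it combines pairs of transforms of order `span` into transforms of order `2 * span`.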

[0059] In accordance with another embodiment of the invention, dual FHT generators can be implemented so as to share accumulators. FIG. 5 shows a dual FHT generator 300 that performs a 64^(th)-order, 10 bit Fast Hadamard Transform on two independent channels using 6 levels of additions performed in six stages of FIFO (First In, First Out) registers 304-1 through 304-6 and 305-1 through 305-6, and accumulators 312-1 through 312-6. FIFOs 304-n are configured to operate similarly to shift registers 104-n or RAM-based shift registers 204-n of FIGS. 2A and 4, respectively. Accumulators 312-n are configured to operate as adders 112-n including the function of two's complement generators 108-n of FIGS. 2A and 4. Dual FHT generator 300 performs the same functions as described for FHT generator 100 and 270 of FIGS. 2A and 4, respectively, while sharing critical hardware, importantly accumulators 312-1 through 312-6. Channel 1 input 302 provides input to channel 1 FHT generator 350 to generate channel 1 output 314; and channel 2 input 303 provides input to channel 2 FHT generator 352 to generate channel 2 output 315. The channel 1 and 2 inputs 302 and 303 are controlled such that a single accumulator block is shared between the two channels while meeting all throughput requirements. Essentially, the operation of accumulators 312-1 through 312-6 is fast enough to perform addition and subtraction operations for two separate channels. By doing so, the number of accumulators that perform add operations is reduced by 50% which provides a significant reduction in area on an integrated circuit. While hardware sharing is disclosed in FIG. 5 for two channels, the teachings of the invention can be extended to provide for additional hardware sharing for more than two channels as would be obvious to those of skill in the art.

[0060] In implementing dual FHT generator 300, four clock cycles occur for each input of data into channel 1 and 2 inputs 302 and 303. In one embodiment of the invention, add operations for channel 1 FHT generator 350 are performed in accumulators 312-1 through 312-6 during a first clock cycle and subtract operations for channel 1 FHT generator 350 are performed during a second clock cycle. During a third clock cycle, add operations for channel 2 FHT generator 352 are performed in accumulators 312-1 through 312-6 and subtract operations for channel 2 FHT generator 352 are performed during a fourth clock cycle. The add and subtract operations for a given channel are the same as were described for FIGS. 2 through 4. Sharing of accumulators 312-1 through 312-6 is accomplished by delaying the channel 2 input 303. Data on channel 2 input 303 is delayed at the input by two clock cycles using delay 310 such that accumulators 312-1 through 312-6 can operate on channel 2 data on the third and fourth clock cycles. Moreover, data on channel 1 is delayed at the output by two clock cycles using delay 311 such that channel 1 and channel 2 FHT data are output with the same timing at channel outputs 314 and 315.
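The four-cycle sharing of one accumulator can be summarized in a short sketch. The function name `four_cycle_schedule` and the (a, b) operand pairs are hypothetical; the sketch shows only the order of operations within one sample period for the embodiment described above, not the delays 310 and 311 or the FIFO contents.

```python
def four_cycle_schedule(ch1_operands, ch2_operands):
    """Return (cycle, channel, operation, result) tuples for one sample period.

    Each *_operands argument is a hypothetical (a, b) pair presented to the
    shared accumulator for that channel.
    """
    a1, b1 = ch1_operands
    a2, b2 = ch2_operands
    return [
        (1, 1, "add", a1 + b1),  # first clock cycle: channel 1 sum
        (2, 1, "sub", a1 - b1),  # second clock cycle: channel 1 difference
        (3, 2, "add", a2 + b2),  # third clock cycle: channel 2 sum
        (4, 2, "sub", a2 - b2),  # fourth clock cycle: channel 2 difference
    ]


for row in four_cycle_schedule((5, 3), (10, 4)):
    print(row)
```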

[0061] A particular implementation of accumulators 312-n of FIG. 5 is shown in FIG. 6 as accumulator 500. Accumulator 500 receives signals B1 and A1 corresponding to channel 1 and also receives signals B2 and A2 corresponding to channel 2. Multiplexer 510 selects between B1 and B2 and multiplexer 512 selects between A1 and A2. Accumulator select signal 514 controls which channel data is passed through multiplexers 510 and 512. In a first clock cycle when signals B1 and A1 corresponding to channel 1 are selected, subtract select line 516 is set low indicating that the accumulator is to perform an add operation. Thus, B1 data is passed to adder input 530 and A1 data is passed to adder input 532. Adder 520 then produces the sum of the two signals B1+A1 at adder output line 528. This signal is then loaded into accumulator buffer 522 upon the occurrence of a high signal at accumulator load line 524. The sum of the two signals B1+A1 is then available at accumulator output line 526.

[0062] In a second clock cycle when signals B1 and A1 are also selected, subtract select line 516 is set high indicating that the accumulator is to perform a subtract operation. The subtract operation is performed by generating the two's complement of signal B1 through the use of XOR 518 and the resulting signal is then passed to adder input 530. A1 data is passed to adder input 532. Adder 520 then produces the difference of the two signals A1−B1 at adder output line 528. This signal is then loaded into accumulator buffer 522 upon the occurrence of a high signal at accumulator load line 524. The difference of the two signals A1−B1 is then available at accumulator output line 526.
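The add and subtract operations of a single adder controlled by a subtract select line can be sketched at the bit level as follows. This is an illustrative model, not the circuit of FIG. 6: the function name `accumulate` and the `width` parameter are hypothetical, and the sketch assumes that the subtract line also drives the adder carry-in, a common way of completing the two's complement after the bitwise XOR, which the description above does not state.

```python
def accumulate(a, b, subtract, width=16):
    """Return a + b (subtract low) or a - b (subtract high) as a signed
    `width`-bit two's complement value, using one adder."""
    mask = (1 << width) - 1
    # XOR every bit of b with the subtract line: b passes unchanged when the
    # line is low and is bitwise inverted when the line is high.
    b_xor = (b & mask) ^ (mask if subtract else 0)
    # Assumption: the subtract line is also the adder carry-in, so the
    # inverted operand plus one forms the two's complement of b.
    total = ((a & mask) + b_xor + (1 if subtract else 0)) & mask
    # Reinterpret the width-bit result as a signed integer.
    if total & (1 << (width - 1)):
        total -= 1 << width
    return total


print(accumulate(7, 3, subtract=False))  # 10
print(accumulate(7, 3, subtract=True))   # 4
print(accumulate(3, 7, subtract=True))   # -4
```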

[0063] Similar add and subtract operations are performed on signals B2 and A2 corresponding to channel 2 during third and fourth clock cycles where multiplexers 510 and 512 select the second data channel. Because the FHT blocks that are shared include most of the major data path components including FIFOs and accumulators as required in an FHT, a significant reduction in hardware is achieved. During an FHT calculation, unnecessary data is discarded as the FHT proceeds, which reduces the amount of memory required for FHT calculation. When the system clock is faster than the rate at which samples arrive, which is typically the case, circuitry in one stage is preferably shared with other stages such that idle circuitry is minimized.

[0064] The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that the specific details are not required in order to practice the invention. In other instances, well known circuits and devices are shown in block diagram form in order to avoid unnecessary distraction from the underlying invention. Thus, the foregoing descriptions of specific embodiments of the present invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.

What is claimed is:
 1. A method for serially calculating a Fast Hadamard Transform of a first order of a sampled signal from a first channel, comprising the steps of: serially storing a plurality of Hadamard Transforms of a second order, the second order being less than the first order; computing a sum of a first Hadamard Transform and a second Hadamard Transform from the plurality of Hadamard Transforms of the second order; storing the sum; computing a difference of the first Hadamard Transform and the second Hadamard Transform from the plurality of Hadamard Transforms of the second order; and storing the difference.
 2. The method of claim 1, wherein the sampled signal contains 2^(N) samples where N is an integer.
 3. The method of claim 1, wherein the sum of the first Hadamard Transform of the second order and of the second Hadamard Transform of the second order is computed during a first clock cycle and the difference of the first Hadamard Transform of the second order and the second Hadamard Transform of the second order is computed during a second clock cycle.
 4. The method of claim 1, wherein the sum of the first Hadamard Transform of the second order and of the second Hadamard Transform of the second order is computed during the same clock cycle and the difference of the first Hadamard Transform of the second order and the second Hadamard Transform of the second order is computed during the same clock cycle.
 5. The method of claim 1, wherein the first order is of the form 2^(K) and the second order is of the form 2^(J), wherein K is greater than J.
 6. The method of claim 1, wherein computing the difference of the first Hadamard Transform of the second order and the second Hadamard Transform of the second order comprises the step of computing a sum of the first Hadamard Transform of the second order and a two's complement of the second Hadamard Transform of the second order.
 7. The method of claim 1, wherein Hadamard Transforms of order one (1) are samples from the sampled signal.
 8. The method of claim 1, wherein the Hadamard Transforms of the second order are serially calculated using a method comprising the steps of: serially storing a plurality of Hadamard Transforms of a third order, the third order being less than the second order; computing a sum of a first Hadamard Transform and a second Hadamard Transform from the plurality of Hadamard Transforms of the third order; storing the sum; computing a difference of the first Hadamard Transform and the second Hadamard Transform from the plurality of Hadamard Transforms of the third order; and storing the difference.
 9. The method of claim 8, wherein computing a difference of the first Hadamard Transform of the third order and the second Hadamard Transform of the third order comprises the step of computing a sum of the first Hadamard Transform of the third order and a two's complement of the second Hadamard Transform of the third order.
 10. The method of claim 8, wherein Hadamard Transforms of order one (1) are samples from the sampled signal.
 11. The method of claim 8, wherein the sampled signal contains 2^(N) samples where N is an integer.
 12. The method of claim 8, wherein the sum of the first Hadamard Transform of the third order and the second Hadamard Transform of the third order is computed during a first clock cycle and the difference of the first Hadamard Transform of the third order and the second Hadamard Transform of the third order is computed during a second clock cycle.
 13. The method of claim 8, wherein the sum of the first Hadamard Transform of the third order and the second Hadamard Transform of the third order is computed during the same clock cycle and the difference of the first Hadamard Transform of the third order and the second Hadamard Transform of the third order is computed during the same clock cycle.
 14. The method of claim 8, wherein the first order is of the form 2^(K) and the second order is of the form 2^(J) and the third order is of the form 2^(I), wherein K is greater than J which is greater than I.
 15. The method of claim 1, wherein the steps of computing and storing sums and differences are applied repeatedly to produce Fast Hadamard Transforms of successively greater order.
 16. The method of claim 15, wherein the steps of computing and storing sums and differences are first applied to samples of the sampled signal.
 17. An apparatus for serially calculating a Fast Hadamard Transform of a sampled signal from a first channel, comprising: a first shift register for serially receiving samples of the signal; a first two's complement generator for producing a two's complement of a first sample of the signal; a first multiplexer for selecting between a first sample of the signal and the two's complement of the signal to produce a multiplexer output; and a first adder for generating a sum of a second sample of the signal and the multiplexer output.
 18. The apparatus of claim 17, wherein the sampled signal contains 2^(N) samples where N is an integer.
 19. The apparatus of claim 17, wherein the first adder generates a sum of a first sample of the signal and the second sample of a signal during a first clock cycle and wherein the first adder generates a difference of a first sample of the signal and the second sample of the signal during a second clock cycle.
 20. The apparatus of claim 17, wherein the first adder generates a sum of a first sample of the signal and the second sample of a signal during the same clock cycle and wherein the first adder generates a difference of a first sample of the signal and the second sample of the signal during the same clock cycle.
 21. The apparatus of claim 20, wherein the sum and the difference are passed to a second shift register.
 22. The apparatus of claim 17, wherein the first shift register is a random access memory.
 23. The apparatus of claim 17, wherein the first shift register is a FIFO.
 24. The apparatus of claim 17, wherein the first adder is shared with a second channel.
 25. The apparatus of claim 17 further comprising: a second shift register for serially receiving the sum from the first adder; a second two's complement generator for producing a two's complement of a signal stored in the second shift register; a second multiplexer for selecting between a signal stored in the second shift register and the two's complement of said signal to produce a multiplexer output; and a second adder for generating a sum of another signal stored in the second shift register and the multiplexer output.
 26. A method for serially calculating a Fast Hadamard Transform of a sampled signal from a first channel, comprising the steps of: serially storing samples of the signal; computing a first sum of a first sample of the signal and a last sample of the signal; storing the first sum; computing a second sum of a first sample of the signal and a two's complement of the last sample of the signal; and storing the second sum.
 27. An apparatus for serially calculating a Fast Hadamard Transform of order 2^(N) of a sampled signal from a first channel, comprising: a first shift register for serially receiving Hadamard Transforms of order 2^(N−1); a first two's complement generator for producing a two's complement of a first Hadamard Transform of order 2^(N−1) that is stored in the first shift register; and a first adder for generating a sum of a second Hadamard Transform of order 2^(N−1), and the first Hadamard Transform of the order 2^(N), the first adder also generating a sum of the second Hadamard Transform of order 2^(N−1) and a two's complement of the first Hadamard Transform of order 2^(N−1).
 28. The apparatus of claim 27, further comprising a first multiplexer for selecting between the first Hadamard Transform of order 2^(N−1) and the two's complement of the first Hadamard Transform of the order 2^(N).
 29. The apparatus of claim 27, further comprising a second shift register for serially receiving Hadamard Transforms of order 2^(N).
 30. The apparatus of claim 27, wherein the sampled signal contains 2^(N) samples where N is an integer.
 31. The apparatus of claim 27, wherein the sum of the second Hadamard Transform of order 2^(N−1) and the first Hadamard Transform of order 2^(N−1) is generated during a first clock cycle and wherein the sum of the second Hadamard Transform of order 2^(N−1) and the two's complement of the first Hadamard Transform of order 2^(N−1) is generated during a second clock cycle.
 32. The apparatus of claim 27, wherein the sum of the second Hadamard Transform of order 2^(N−1) and the first Hadamard Transform of order 2^(N−1) is generated during the same clock cycle and wherein the sum of the second Hadamard Transform of order 2^(N−1) and the two's complement of the first Hadamard Transform of order 2^(N−1) is generated during the same clock cycle.
 33. The apparatus of claim 27, wherein the first shift register is a random access memory.
 34. The apparatus of claim 27, wherein the first shift register is a FIFO register.
 35. The apparatus of claim 27, wherein the first adder is shared with another channel.
 36. A method for serially calculating a Fast Hadamard Transform of a sampled signal from a first channel, comprising the steps of: serially storing samples of the signal; computing a first sum of a second sample of the signal and a first sample of the signal; storing the first sum; computing a second sum of the second sample of the signal and a two's complement of the first sample of the signal; and storing the second sum.
 37. The method of claim 36, further comprising storing the first and second sums in a memory.
 38. The method of claim 36, wherein the sampled signal contains 2^(N) samples where N is an integer.
 39. The method of claim 36, wherein the first sum is computed during a first clock cycle and the second sum is computed during a second clock cycle.
 40. The method of claim 36, wherein the first sum is computed during the same clock cycle and the second sum is computed during the same clock cycle. 